(57) ABSTRACT

A database or some portion thereof is partitioned into 
ownership groups. Each ownership group is assigned one or 
more database servers as owners of the ownership group. 
The database servers that are assigned as owners of an 
ownership group are treated as the owners of all data items 
that belong to the ownership group. That is, they are allowed 
to directly access the data items within the ownership group, 
while other database servers are not allowed to directly 
access those data items. A mechanism is provided for 
transitioning ownership of a data item. Ownership is trans- 
ferred by disabling access to the data item, waiting for all 
transactions that have made changes to the data item to 
either commit or abort, changing data that indicates owner- 
ship of the data item from a first owner to a second owner, 
and enabling access to the data item. 
TRANSITIONING OWNERSHIP OF DATA ITEMS BETWEEN OWNERSHIP GROUPS

FIELD OF THE INVENTION

The present invention relates to database systems and, [...]

BACKGROUND OF THE INVENTION

Multi-processing computer systems are systems that include multiple
processing units that are able to execute instructions in parallel
relative to each other. To take advantage of parallel processing
capabilities, different aspects of a task may be assigned to different
processing units. The different aspects of a task are referred to
herein as work granules, and the process responsible for distributing
the work granules among the available processing units is referred to
as a coordinator process.

Multi-processing computer systems typically fall into three
categories: shared everything systems, shared disk systems, and shared
nothing systems. The constraints placed on the distribution of work to
processes performing granules of work vary based on the type of
multi-processing system involved.

In shared everything systems, processes on all processors have direct
access to all dynamic memory devices (hereinafter generally referred to
as "memory") and to all static memory devices (hereinafter generally
referred to as "disks") in the system. Consequently, in a shared
everything system there are few constraints with respect to how work
granules may be assigned. However, a high degree of wiring between the
various computer components is required to provide shared everything
functionality. In addition, there are scalability limits to shared
everything architectures.

In shared disk systems, processors and memories are grouped into nodes.
Each node in a shared disk system may itself constitute a shared
everything system that includes multiple processors and multiple
memories. Processes on all processors can access all disks in the
system, but only the processes on processors that belong to a
particular node can directly access the memory within the particular
node. Shared disk systems generally require less wiring than shared
everything systems. However, shared disk systems are more susceptible
to unbalanced workload conditions. For example, if a node has a process
that is working on a work granule that requires large amounts of
dynamic memory, the memory that belongs to the node may not be large
enough to simultaneously store all required data. Consequently, the
process may have to swap data into and out of its node's local memory
even though large amounts of memory remain available and unused in
other nodes.

Shared disk systems provide compartmentalization of software failures
resulting in memory corruption. The only exceptions are the control
blocks used by the inter-node lock manager, which are virtually
replicated in all nodes.

In shared nothing systems, all processors, memories and disks are
grouped into nodes. In shared nothing systems as in shared disk
systems, each node may itself constitute a shared everything system or
a shared disk system. Only the processes running on a particular node
can directly access the memories and disks within the particular node.
Of the three general types of multi-processing systems, shared nothing
systems typically require the least amount of wiring between the
various system components. However, shared nothing systems are the most
susceptible to unbalanced workload conditions. For example, all of the
data to be accessed during a particular work granule may reside on the
disks of a particular node. Consequently, only processes running within
that node can be used to perform the work granule, even though
processes on other nodes remain idle.

Shared nothing systems provide compartmentalization of software
failures resulting in memory and/or disk corruption. The only
exceptions are the control blocks controlling "ownership" of data
subsets by different nodes. Ownership is much more rarely modified than
shared disk lock management information. Hence, the ownership
techniques are simpler and more reliable than the shared disk lock
management techniques, because they do not have high performance
requirements.

Databases that run on multi-processing systems typically fall into two
categories: shared disk databases and shared nothing databases. Shared
disk database systems are systems in which multiple database servers
(typically running on different nodes) are capable of reading and
writing to any part of the database. Data access in the shared disk
architecture is coordinated via a distributed lock manager. Shared disk
databases may be run on both shared nothing and shared disk computer
systems. To run a shared disk database on a shared nothing computer
system, software support may be added to the operating system or
additional hardware may be provided to allow processes to have direct
access to remote disks.

A shared nothing database assumes that a process can only directly
access data if the data is contained on a disk that belongs to the same
node as the process. Specifically, the database data is subdivided
among the available database servers. Each database server can directly
read and write only the portion of data owned by that database server.
If a first server seeks to access data owned by a second server, then
the first database server must send messages to the second database
server to cause the second database server to perform the data access
on its behalf.

Shared nothing databases may be run on both shared disk and shared
nothing multi-processing systems. To run a shared nothing database on a
shared disk machine, a software mechanism may be provided for logically
partitioning the database, and assigning ownership of each partition to
a particular node.

Shared nothing and shared disk systems each have advantages associated
with their particular architectures. For example, shared nothing
databases provide better performance if there are frequent write
accesses (write hot spots) to the data. Shared disk databases provide
better performance if there are frequent read accesses (read hot
spots). Also, as mentioned above, shared nothing systems provide better
fault containment in the presence of software failures.

In light of the foregoing, it would be desirable to provide a single
database system that is able to provide the performance advantages of
both types of database architectures. Typically, however, these two
types of architectures are mutually exclusive.

SUMMARY OF THE INVENTION

A method is provided for transitioning ownership of a data item.
Ownership is transferred by disabling access to the data item, waiting
for all transactions that have made changes to the data item to either
commit or abort, changing data that indicates ownership of the data
item from a first owner to a second owner, and enabling access to the
data item.
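The four transfer steps (disable access, wait for open transactions, change the ownership data, enable access) can be illustrated with a minimal Python sketch. The `DataItem` class, the `transfer_ownership` function and the `wait_for_transactions` callback are hypothetical names introduced only for illustration; the source does not prescribe any particular data structures or interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """Hypothetical in-memory stand-in for a data item and its bookkeeping."""
    owner: str
    access_enabled: bool = True
    # Transactions that changed the item and have not yet committed or aborted.
    open_transactions: set = field(default_factory=set)


def transfer_ownership(item, first_owner, second_owner, wait_for_transactions):
    """Move ownership of `item` from first_owner to second_owner."""
    if item.owner != first_owner:
        raise ValueError("item is not owned by the stated first owner")
    # Step 1: disable access so that no further changes are made to the item.
    item.access_enabled = False
    try:
        # Step 2: wait until every transaction that changed the item has
        # either committed or aborted (the callback is an assumed hook).
        wait_for_transactions(item)
        if item.open_transactions:
            raise RuntimeError("transactions still open after waiting")
        # Step 3: change the data that indicates ownership of the item.
        item.owner = second_owner
    finally:
        # Step 4: enable access to the item again.
        item.access_enabled = True
```

In this sketch, access is re-enabled even when the wait fails, in which case the first owner simply remains the owner; that choice is an assumption of the example rather than something stated in the text.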
Ownership groups are provided to establish sets of commonly owned data
items. When a data item undergoing an ownership change belongs to an
ownership group initially owned by the first owner, the step of
changing data that indicates ownership of the data item from a first
owner to a second owner may be accomplished by changing the owner of
the ownership group from the first owner to the second owner, or may be
accomplished by changing data that indicates the ownership group to
which the data item belongs to reflect that the data item belongs to a
second ownership group owned by the second owner.

The process performing the ownership transition may fail. According to
one aspect of the invention, where the transition involves changing the
owner of an ownership group, the system responds to such failure by
determining whether the process failed before changing the data that
indicates ownership of the ownership group. If the process failed
before changing the data that indicates ownership of the ownership
group, then the first owner is restored as owner of the ownership
group. If the process failed after changing the data that indicates
ownership of the ownership group, then the second owner is retained as
owner of the ownership group.

When the transition involves changing the ownership group to which the
data item belongs, removing the data item from its current ownership
group involves updating a first file, and adding the data item to the
new ownership group involves updating a second file. Failure of a
process that is performing the ownership transition is responded to by
determining whether the process performing the ownership transition
died before the change to the second file. If the process performing
the ownership transition died before the change to the second file,
then the data item is restored as a member of the first ownership
group. If the process performing the ownership transition died after
the change to the second file but before the change to the first file,
then the transition to the second ownership group is completed by
updating the first file.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way
of limitation, in the figures of the accompanying drawings and in which
like reference numerals refer to similar elements and in which:

FIG. 1 is a block diagram of a computer system on which an embodiment
of the invention may be implemented;

FIG. 2 is a block diagram of a distributed database system that uses
ownership groups according to an embodiment of the invention;

FIG. 3 is a flowchart illustrating steps for performing an operation on
a data item in a system that supports ownership groups;

FIG. 4 is a flowchart illustrating steps for changing the owner set of
an ownership group according to an embodiment of the invention; and

FIG. 5 is a block diagram that illustrates a technique for making an
atomic change according to an embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[...] a hybrid shared disk/shared nothing database system is described.
In the following description, for the purposes of explanation, numerous
specific details are set forth in order to provide a thorough
understanding of the present invention. It will be apparent, however,
to one skilled in the art that the present invention may be practiced
without these specific details. In other instances, well-known
structures and devices are shown in block diagram form in order to
avoid unnecessarily obscuring the present invention.

Hardware Overview

FIG. 1 is a block diagram that illustrates a computer system 100 upon
which an embodiment of the invention may be implemented. Computer
system 100 includes a bus 102 or other communication mechanism for
communicating information, and a processor 104 coupled with bus 102 for
processing information. Computer system 100 also includes a main memory
106, such as a random access memory (RAM) or other dynamic storage
device, coupled to bus 102 for storing information and instructions to
be executed by processor 104. Main memory 106 also may be used for
storing temporary variables or other intermediate information during
execution of instructions to be executed by processor 104. Computer
system 100 further includes a read only memory (ROM) 108 or other
static storage device coupled to bus 102 for storing static information
and instructions for processor 104. A storage device 110, such as a
magnetic disk or optical disk, is provided and coupled to bus 102 for
storing information and instructions.

Computer system 100 may be coupled via bus 102 to a display 112, such
as a cathode ray tube (CRT), for displaying information to a computer
user. An input device 114, including alphanumeric and other keys, is
coupled to bus 102 for communicating information and command selections
to processor 104. Another type of user input device is cursor control
116, such as a mouse, a trackball, or cursor direction keys for
communicating direction information and command selections to processor
104 and for controlling cursor movement on display 112. This input
device typically has two degrees of freedom in two axes, a first axis
(e.g., x) and a second axis (e.g., y), that allows the device to
specify positions in a plane.

The invention is related to the use of computer system 100 for
providing a hybrid shared disk/shared nothing database system.
According to one embodiment of the invention, such a database system is
provided by computer system 100 in response to processor 104 executing
one or more sequences of one or more instructions contained in main
memory 106. Such instructions may be read into main memory 106 from
another computer-readable medium, such as storage device 110. Execution
of the sequences of instructions contained in main memory 106 causes
processor 104 to perform the process steps described herein. In
alternative embodiments, hard-wired circuitry may be used in place of
or in combination with software instructions to implement the
invention. Thus, embodiments of the invention are not limited to any
specific combination of hardware circuitry and software.

The term "computer-readable medium" as used herein refers to any medium
that participates in providing instructions to processor 104 for
execution. Such a medium may take many forms, including but not limited
to, non-volatile media, volatile media, and transmission media.
Non-volatile media includes, for example, optical or magnetic disks,
such as storage device 110. Volatile media includes dynamic memory,
such as main memory 106. Transmission media includes coaxial cables,
copper wire and fiber optics, including the wires that comprise bus
102. Transmission media can also take the form of acoustic or light
waves, such as those generated during radio-wave and infra-red data
communications.
Common forms of computer-readable media include, for example, a floppy
disk, a flexible disk, hard disk, magnetic tape, or any other magnetic
medium, a CD-ROM, any other optical medium, punchcards, papertape, any
other physical medium with patterns of holes, a RAM, a PROM, an EPROM,
a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as
described hereinafter, or any other medium from which a computer can
read.

Various forms of computer readable media may be involved in carrying
one or more sequences of one or more instructions to processor 104 for
execution. For example, the instructions may initially be carried on a
magnetic disk of a remote computer. The remote computer can load the
instructions into its dynamic memory and send the instructions over a
telephone line using a modem. A modem local to computer system 100 can
receive the data on the telephone line and use an infra-red transmitter
to convert the data to an infra-red signal. An infra-red detector can
receive the data carried in the infra-red signal and appropriate
circuitry can place the data on bus 102. Bus 102 carries the data to
main memory 106, from which processor 104 retrieves and executes the
instructions. The instructions received by main memory 106 may
optionally be stored on storage device 110 either before or after
execution by processor 104.

Computer system 100 also includes a communication interface 118 coupled
to bus 102. Communication interface 118 provides a two-way data
communication coupling to a network link 120 that is connected to a
local network 122. For example, communication interface 118 may be an
integrated services digital network (ISDN) card or a modem to provide a
data communication connection to a corresponding type of telephone
line. As another example, communication interface 118 may be a local
area network (LAN) card to provide a data communication connection to a
compatible LAN. Wireless links may also be implemented. In any such
implementation, communication interface 118 sends and receives
electrical, electromagnetic or optical signals that carry digital data
streams representing various types of information.

Network link 120 typically provides data communication through one or
more networks to other data devices. For example, network link 120 may
provide a connection through local network 122 to a host computer 124
or to data equipment operated by an Internet Service Provider (ISP)
126. ISP 126 in turn provides data communication services through the
world wide packet data communication network now commonly referred to
as the "Internet" 128. Local network 122 and Internet 128 both use
electrical, electromagnetic or optical signals that carry digital data
streams. The signals through the various networks and the signals on
network link 120 and through communication interface 118, which carry
the digital data to and from computer system 100, are exemplary forms
of carrier waves transporting the information.

Computer system 100 can send messages and receive data, including
program code, through the network(s), network link 120 and
communication interface 118. In the Internet example, a server 130
might transmit a requested code for an application program through
Internet 128, ISP 126, local network 122 and communication interface
118. In accordance with the invention, one such downloaded application
provides for a hybrid shared disk/shared nothing database system as
described herein.

The received code may be executed by processor 104 as it is received,
and/or stored in storage device 110, or other non-volatile storage for
later execution. In this manner, computer system 100 may obtain
application code in the form of a carrier wave.

The hybrid shared disk/shared nothing database system described herein
is implemented on a computer system for which shared disk access to all
disks is provided from all nodes, i.e. a system that could be used for
strictly shared disk access, although according to one aspect of the
invention, access to some "shared nothing" disk data is restricted by
the software.

Ownership Groups

According to an embodiment of the invention, a database (or some
portion thereof) is partitioned into ownership groups. Each ownership
group is assigned one or more database servers as owners of the
ownership group. The database servers that are assigned as owners of an
ownership group are treated as the owners of all data items that belong
to the ownership group. That is, they are allowed to directly access
the data items within the ownership group, while other database servers
are not allowed to directly access those data items.

According to one embodiment, data items that are frequently accessed
together are grouped into the same ownership group, thus ensuring that
they will be owned by the same database servers. Ownership groups allow
operations to be performed on a group of related data items by treating
the group of related data items as an atomic unit. For example,
ownership of all data items within an ownership group may be
transferred from a first database server to a second database server by
transferring ownership of the ownership group from the first database
server to the second database server.

Hybrid Database System

FIG. 2 is a block diagram that depicts a hybrid database system
architecture according to an embodiment of the invention. FIG. 2
includes three nodes 202, 204 and 206 on which are executing three
database servers 208, 210 and 212, respectively. Database servers 208,
210 and 212 are respectively associated with buffer caches 220, 222 and
224. Each of nodes 202, 204 and 206 is connected to a system bus 218
that allows database servers 208, 210 and 212 to directly access data
within a database 250 that resides on two disks 214 and 216.

The data contained on disks 214 and 216 is logically partitioned into
ownership groups 230, 232, 234 and 236. According to an embodiment of
the invention, each ownership group includes one or more tablespaces. A
tablespace is a collection of one or more datafiles. However, the
invention is not limited to any particular granularity of partitioning,
and may be used with ownership groups of greater or lesser scope.

According to one embodiment, each ownership group is designated as a
shared disk ownership group or a shared nothing ownership group. Each
ownership group that is designated as a shared nothing ownership group
is assigned one of the available database servers as its owner. In the
system illustrated in FIG. 2, ownership group 230 is a shared nothing
ownership group owned by server 210, ownership group 232 is a shared
disk ownership group, ownership group 234 is a shared nothing ownership
group owned by server 212, and ownership group 236 is a shared nothing
ownership group owned by server 208.

Because ownership group 230 is a shared nothing ownership group owned
by server 210, only server 210 is
allowed to directly access data (D1) within ownership group
230. Any other server that seeks to access data in ownership
group 230 is normally required to send message requests to
server 210 that request server 210 to perform the desired
data access on the requesting server's behalf. Likewise,
ownership groups 234 and 236 are also shared nothing
ownership groups, and may only be directly accessed by
their respective owners.

Since ownership group 232 is a shared disk ownership
group, any database server may directly access the set of
data contained therein. As shown in FIG. 2, each database 
server may contain a copy of this data (D2) within its buffer 
cache. A distributed lock manager is employed to coordinate 
access to the shared data. 

According to one embodiment, the database system
includes a mechanism to dynamically change a particular
ownership group from shared disk to shared nothing, and
vice versa. For example, if a particular set of shared nothing
data is subject to frequent read accesses (read hot spots),
then that data can be converted to shared disk by converting
the ownership group to which it belongs from shared nothing
to shared disk. Likewise, if a particular set of shared disk
data is subject to frequent write accesses (write hot spots),
then that data can be converted to shared nothing data by
changing the ownership group that contains the data to a
shared nothing ownership group and assigning ownership of
the ownership group to a database server.
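The conversion rule in the preceding paragraph can be sketched as follows. This is only an illustration: the `OwnershipGroup` class, the `rebalance` function, the access counters and the `threshold` value are all assumptions, since the text does not say how hot spots are detected or measured.

```python
from dataclasses import dataclass
from typing import Optional

SHARED_DISK = "shared disk"
SHARED_NOTHING = "shared nothing"


@dataclass
class OwnershipGroup:
    kind: str
    owner: Optional[str] = None  # meaningful only for shared nothing groups


def rebalance(group, reads, writes, candidate_owner, threshold=1000):
    """Convert a group when a hot spot is seen (threshold is an assumed knob)."""
    if group.kind == SHARED_NOTHING and reads >= threshold and reads > writes:
        # Read hot spot: convert to shared disk so that every database
        # server may access the data directly.
        group.kind, group.owner = SHARED_DISK, None
    elif group.kind == SHARED_DISK and writes >= threshold and writes > reads:
        # Write hot spot: convert to shared nothing and assign ownership
        # of the group to a single database server.
        group.kind, group.owner = SHARED_NOTHING, candidate_owner
    return group
```

A real system would also have to carry out the ownership transition itself before the new designation takes effect; the sketch covers only the decision.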

According to one aspect of the invention, the database
system also includes a mechanism to reassign ownership of
a shared nothing ownership group from one node to another
node. This may be requested by an operator to improve load
balancing, or may happen automatically to continue to
support access to the data of a shared nothing ownership
group owned by a node N1 after N1 fails.

Ownership 

As described above, a database system is provided in
which some ownership groups are designated as shared
nothing ownership groups, and some ownership groups are
designated as shared disk ownership groups. An owner is
assigned to every shared nothing ownership group. The
ownership of a shared nothing ownership group is made
known to all database servers so that they can send requests
to the owner of the ownership group when they require tasks
performed on data within the ownership group.

According to one embodiment of the invention, ownership
information for the various ownership groups is maintained
in a control file, and all database servers that have
access to the database are allowed to access the control file.
Each database server may store a copy of the control file in
its cache. With a copy of the control file in its cache, a
database server may determine the ownership of ownership
groups without always having to incur the overhead associated
with reading the ownership information from disk.

FIG. 3 is a flowchart illustrating the steps performed by a
database server that desires data in a system that employs
both shared disk and shared nothing ownership groups. In
step 300, the database server determines the ownership
group to which the desired data belongs. In step 302, the
database server determines the owner of the ownership
group that contains the desired data. As explained above,
step 302 may be performed by accessing a control file, a
copy of which may be stored in the cache associated with the
database server. If the ownership group is a shared disk
ownership group, then all database servers are considered to
be owners of the ownership group. If the ownership group
is a shared nothing ownership group, then a specific database
server will be specified in the control file as the owner of the
ownership group.

In step 304, the database server determines whether it is 
the owner of the ownership group that holds the desired data. 
The database server will be the owner of the ownership 
group if either (1) the ownership group is a shared disk 
ownership group, or (2) the ownership group is a shared 
nothing ownership group and the database server is desig- 
nated in the control file as the owner of the shared nothing 
ownership group. If the database server is the owner of the 
ownership group that holds the desired data, control passes 
to step 310, where the database server directly retrieves the
desired data. 

If the database server is not the owner of the ownership 
group that holds the data, control passes to step 306. At step 
306, the database server sends a request to the owner of the 
ownership group for the owner to access the desired data on 
behalf of the requestor. At step 308, the database server 
receives the desired data from the owner of the ownership 
group. 
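The FIG. 3 steps can be illustrated with a small Python sketch. The `DatabaseServer` class, its attribute names and the dictionary-based control file are hypothetical stand-ins chosen for the example; in particular, the request sent to the owner in steps 306 and 308 is modeled as a direct method call rather than as a message exchange.

```python
SHARED_DISK = "shared disk"
SHARED_NOTHING = "shared nothing"


class DatabaseServer:
    """Hypothetical database server that follows the FIG. 3 steps."""

    def __init__(self, name, control_file, group_of, disks, servers):
        self.name = name
        self.control_file = control_file  # cached copy: group -> (kind, owner)
        self.group_of = group_of          # data item key -> ownership group
        self.disks = disks                # stand-in for the shared disks
        self.servers = servers            # registry: server name -> server
        self.direct_reads = 0             # direct accesses, for illustration
        servers[name] = self

    def get(self, key):
        group = self.group_of[key]                      # step 300
        kind, owner = self.control_file[group]          # step 302
        if kind == SHARED_DISK or owner == self.name:   # step 304
            self.direct_reads += 1
            return self.disks[key]                      # step 310
        # Steps 306 and 308: ask the owner to access the data on this
        # server's behalf and receive the data from the owner.
        return self.servers[owner].get(key)
```

With the FIG. 2 assignments (ownership group 230 owned by server 210, ownership group 232 shared disk), a request issued on server 208 for data in group 230 is forwarded to server 210, while a request for data in group 232 is served directly.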

Owner Sets 

According to an alternative embodiment, an ownership
group is not limited to being either (1) owned by only one
database server (shared nothing) or (2) owned by all database
servers (shared disk). Rather, an ownership group may
alternatively be owned by any specified subset of the available
database servers. The set of database servers that own
a particular ownership group is referred to herein as the
owner set for the ownership group. Thus, a shared nothing
ownership group is equivalent to an ownership group that
includes only one database server in its owner set, while a
shared disk ownership group is equivalent to an ownership
group that includes all available database servers in its
owner set.

When owner sets are used, to perform a task on data in an 
ownership group, a database server that does not belong to 
the owner set of the ownership group sends a request to one 
of the database servers that belong to the owner set of the 
ownership group. In response to the request, the recipient of 
the request directly accesses the data in the ownership group 
and performs the requested task. Contention caused by write 
hot spots within the ownership group only occurs among the 
database servers that belong to the owner set of the owner- 
ship group. 
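A minimal sketch of owner sets follows. The function names and the rule for choosing which member of the owner set receives a forwarded request are assumptions made for illustration; the text only requires that the request go to one of the servers in the owner set.

```python
def classify(owner_set, all_servers):
    """Name the designation that an owner set is equivalent to."""
    if len(owner_set) == 1:
        return "shared nothing"
    if owner_set == all_servers:
        return "shared disk"
    return "partial owner set"


def route(requesting_server, owner_set):
    """Return the server that should directly access the data."""
    if requesting_server in owner_set:
        # Members of the owner set access the data directly.
        return requesting_server
    # Any member of the owner set may serve the request; picking the
    # smallest name is an arbitrary, deterministic choice for this sketch.
    return min(owner_set)
```

For example, with servers 208, 210 and 212 and an owner set containing 208 and 210, a request from server 212 is routed to a member of the owner set, while server 210 accesses the data itself.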

Changing the Ownership of an Ownership Group 

As mentioned above, it may be desirable to change an 
ownership group from shared nothing to shared disk, or from 
shared disk to shared nothing. Such changes may be initiated 
automatically in response to the detection of read or write 
hot spots, or manually (e.g. in response to a command issued 
by a database administrator). 

Various techniques may be used to transition an owner- 
ship group from one owner set (the "source owner set") to 
the other (the "destination owner set"). FIG. 4 is a flowchart 
that illustrates steps performed for changing the owner set of 
an ownership group according to one embodiment of the 
invention. 

Referring to FIG. 4, at step 400 a "disable change" 
message is broadcast to all of the available database servers. 
The disable change message instructs the database servers to 
cease making forward changes to data within the ownership 
group whose owner set is going to be changed (the
"transitioning ownership group"). Forward changes are changes
that create a version that has previously not existed (i.e.
create a new "current" version of a data item). Backward
changes, on the other hand, are changes that result in the
re-creation of a previously existing version of a data item.

At step 402, the portion of the database system responsible
for changing the owner set of ownership groups (the
"owner changing mechanism") waits until all transactions
that have made changes to the transitioning ownership group
either commit or roll back. Those transactions that have
performed some but not all of their updates to data within the
transitioning ownership group prior to step 400 will roll
back because forward changes to the ownership group are no
longer allowed. Because step 400 prevents only forward
changes to the transitioning ownership group, database
servers are not prevented from rolling back the changes that
they have already made to the transitioning ownership
group.

Unfortunately, a significant amount of overhead may be
required to determine which transactions have updated the
transitioning ownership group. Therefore, an embodiment of
the invention is provided in which the database system does
not attempt to track the transactions that have updated data
within the transitioning ownership group. However, without
tracking this information, it must be assumed that any of the
transactions that were allowed to access data in the
transitioning ownership group and that were begun prior to step
400 may have made changes to data within the transitioning
ownership group.

Based on this assumption, step 402 requires the owner
changing mechanism to wait until all of the transactions that
(1) may have possibly accessed data in the transitioning
ownership group, and (2) were begun prior to step 400 either
commit or roll back. Typically, only transactions that are
executing in database servers that belong to the source
owner set of the transitioning ownership group may have
possibly accessed data in the transitioning ownership group.
Thus, if the transitioning ownership group is shared disk,
then the owner changing mechanism must wait until all
transactions in all database servers that were begun prior to
step 400 either commit or roll back. If the transitioning
ownership group is shared nothing, then the owner changing
mechanism must wait until all transactions in the database
server that owns the transitioning ownership group either
commit or roll back. Note that this includes user transactions
that may have originated in other nodes, and have created
subtransactions local to the transitioning ownership group.

When all transactions that could possibly have updated
data within the transitioning ownership group have either
committed or aborted, control proceeds to step 404. At step
404, the owner changing mechanism changes the owner set
of the transitioning ownership group by updating the control
file in an atomic operation. For example, the designation
change may cause the transitioning ownership group to
transition from a shared nothing ownership group to a shared
disk ownership group or vice versa. Alternatively, the
designation change may simply change the database server that
owns a shared nothing ownership group, without changing
the ownership group type.

After the control file has been changed to reflect the new
owner set of the transitioning ownership group, control
proceeds to step 406. At step 406, a "refresh cache" message
is sent to all available database servers. Upon receiving the
refresh cache message, each database server invalidates the
copy of the control file that it contains in its cache.
Consequently, when the database servers subsequently need
to inspect the control file to determine ownership of an
ownership group, they retrieve the updated version of the
control file from persistent storage. Thus they are made
aware of the new owner set of the transitioning ownership
group.

Adjusting to Ownership Changes

When a particular query is going to be used frequently, the
query is typically stored within the database. Most database
systems generate an execution plan for a stored query at the
time that the stored query is initially submitted to the
database system, rather than recomputing an execution plan
every time the stored query is used. The execution plan of a
query must take into account the ownership of the ownership
groups that contain the data accessed by the query. For
example, if the query specifies an update to a data item in
an ownership group owned exclusively by a particular database
server, the execution plan of the query must include shipping
that update operation to that particular database server.

However, as explained above, a mechanism is provided
for changing the ownership of ownership groups. Such
ownership changes may take place after the execution plan
for a particular stored query has been generated. As a
consequence, execution plans may require certain database
servers to perform operations on data within ownership
groups that they no longer own. According to one embodiment
of the invention, database servers that are asked to
perform operations on data within ownership groups that
they do not own return an "ownership error" message to the
processes that request the operations. In response to
receiving an ownership error message, a new execution plan is
generated for the query that caused the error. The new
execution plan takes into account the current ownership of
ownership groups, as indicated by the current version of the
control file.

Control File Management

As described above, an atomic operation is used to update
the control file to change the designation of an ownership
group (step 404). Various mechanisms may be used to
ensure that this operation is atomic. For example, according
to one embodiment of the invention, the control file includes
a bitmap and a series of block pairs, as illustrated in FIG. 5.
Each bit in the bitmap 512 corresponds to a block pair.
At any given time, only one of the blocks in a block pair
contains current data. The value of the bit associated with a
block pair indicates which of the two blocks in the
corresponding block pair holds the current data. For example, bit
502 is associated with block pair 504 that includes blocks
506 and 508. The value of bit 502 (e.g. "0") indicates that
block 506 is the current block within block pair 504. The
value of bit 502 may be changed to "1" to indicate that the
data in block 508 is current (and consequently that the data
in block 506 is no longer valid).

Because the data in the non-current block of a block pair
is considered invalid, data may be written into the
non-current block without changing the effective contents of the
control file. The contents of the control file are effectively
changed only when the value of a bit in the bitmap 512 is
changed. Thus, as preliminary steps to an atomic change, the
contents of the current block 506 of a block pair 504 may be
loaded into memory, modified, and stored into the
non-current block 508 of the block pair 504. After these
preliminary steps have been performed, the change can be
atomically made by changing the value of the bit 502 within the
bitmap 512 that corresponds to the block pair 504.
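
The block-pair scheme described above can be illustrated with a
minimal in-memory sketch. It is not the implementation of FIG. 5:
the class name BlockPairControlFile, its methods, and the dictionary
used as block contents are all hypothetical, and a Python list
assignment stands in for the single-bit write that is assumed to be
atomic on persistent storage.

```python
import copy


class BlockPairControlFile:
    """Toy model of a control file made of a bitmap and block pairs.

    All names here are illustrative assumptions, not taken from the patent.
    """

    def __init__(self, initial_blocks):
        # One pair per entry; slot 0 starts out as the current block.
        self.pairs = [[copy.deepcopy(block), None] for block in initial_blocks]
        # bitmap[i] == 0 means slot 0 of pair i is current, 1 means slot 1.
        self.bitmap = [0] * len(initial_blocks)

    def read(self, pair_index):
        """Return the contents of the current block of a pair."""
        return self.pairs[pair_index][self.bitmap[pair_index]]

    def update(self, pair_index, modify):
        """Apply `modify` to a copy of the current block, then switch to it."""
        current_slot = self.bitmap[pair_index]
        spare_slot = 1 - current_slot
        # Preliminary steps: load the current block, modify the copy, and
        # store it into the non-current block. Readers are unaffected so far.
        staged = copy.deepcopy(self.pairs[pair_index][current_slot])
        modify(staged)
        self.pairs[pair_index][spare_slot] = staged
        # Flipping the bit is the single step that makes the change effective.
        self.bitmap[pair_index] = spare_slot


def assign_to_server_b(block):
    block["group_1"] = ["server_B"]


def failing_change(block):
    block["group_1"] = ["server_C"]
    raise RuntimeError("simulated failure before the bit is flipped")


control_file = BlockPairControlFile([{"group_1": ["server_A"]}])
control_file.update(0, assign_to_server_b)
after_success = copy.deepcopy(control_file.read(0))

try:
    control_file.update(0, failing_change)
except RuntimeError:
    pass
after_failure = copy.deepcopy(control_file.read(0))
```

In this sketch a failure during the preliminary steps leaves the bit
untouched, so readers continue to see the previous owner set.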
This is merely one example of a technique for performing
changes atomically. Other techniques are possible. Thus, the
present invention is not limited to any particular technique
for performing changes atomically.

Moving Data Items Between Ownership Groups

One way to change ownership of a data item, such as a
tablespace, is to change the owner set of the ownership
group to which the data item belongs. A second way to
change ownership of a data item is to reassign the data item
to a different ownership group. For example, the owner of
tablespace A can be changed from server A to server B by
removing tablespace A from an ownership group assigned to
server A and placing it in an ownership group assigned to
server B.

According to one embodiment of the invention, the
membership of ownership groups is maintained in a data
dictionary within the database. Consequently, to move a data item
from a first ownership group to a second ownership group,
the membership information for both the first and second
ownership groups have to be updated within the data
dictionary. The various steps involved in changing to which
ownership group a data item belongs are similar to those
described above for changing the owner set of an ownership
group. Specifically, access to the tablespace that is being
transferred (the "transitioning tablespace") is disabled. The
ownership change mechanism then waits for all transactions
that hold locks on the data item (or a component thereof) to
either roll back or commit.

Once all of the transactions that hold locks on the data
item have either committed or rolled back, the data
dictionary is modified to indicate the new ownership group of the
data item. The control file is then modified to indicate that
the owner set of the ownership group to which the data item
was moved is now the owner set of the data item. This
change atomically enables the target owner to access the
data item. If the ownership group is in the middle of an
ownership change, the control file is updated to indicate that
the data item is in a "moving delayed" state.

Changing the ownership group to which a data item
belongs may or may not cause the owner of the data item to
change. If the owner set of the source ownership group is the
same as the owner set of the transitioning ownership group,
then the owner of the data item is not changed when the data
item is moved from the source ownership group to the
transitioning ownership group. On the other hand, if the
owner set of the source ownership group is not the same as
the owner set of the transitioning ownership group, then the
owner of the data item is changed when the data item is
moved from the source ownership group to the transitioning
ownership group.

Specific Ownership Changes Conditions

According to one embodiment, techniques are provided to
handle situations in which (1) an attempt is made to change
the owner set of an ownership group when a data item that
belongs to the ownership group is in the middle of being
transferred to a different ownership group; and (2) an
attempt is made to transfer a data item to a different
ownership group when that destination ownership group is
in the middle of having its owner set changed.

To detect these conditions, an embodiment of the
invention provides within the control file one or more status flags
for each data item (e.g. tablespace) that belongs to an
ownership group. For example, a flag may be used to
indicate whether the ownership group to which a data item
belongs is in the process of being assigned a new owner.
Similarly, a flag may indicate that a data item is in the
process of being transferred to a different ownership group.

When an attempt is made to change the owner set of an
ownership group, the ownership change mechanism inspects
the status flags of the data items that belong to the ownership
group to determine whether any data item that belongs to the
ownership group is in the middle of being transferred to a
different ownership group. If any data item that belongs to
the ownership group is in the middle of being transferred to
a different ownership group, then the attempt to change the
owner set of the ownership group is aborted. If no data items
that belong to the ownership group are in the middle of being
transferred to a different ownership group, then the status
flags of the data items that belong to the ownership group are
set to indicate that the ownership of the ownership group to
which data items belong is in transition. A message is
also sent to the various database servers to invalidate their
cached versions of the control file. This ensures that they see
the new values of status flags.

When an attempt is made to transfer a data item to a
different ownership group, the status flags of the data item
are checked to determine whether the destination ownership
group is in the middle of having its owner set changed.
According to one embodiment, this check is performed after
modifying the data dictionary to reflect the new ownership
group of the data item, but before updating the control file
to give the owner of the new ownership group access to the
data item. If the ownership group to which the data item
belongs is in the middle of having its owner set changed,
then the status flags for the data item in the control file are
set to indicate a "move delayed" condition. In addition, a
database-wide "move delayed" flag is set to indicate that the
database contains some data items that are in a move delayed
state.

When the operation of transferring ownership of the
transitioning ownership group is completed, the process
performing the transfer updates the status flags to indicate
that the ownership group is no longer in the process of an
ownership transfer. In addition, the process clears the "move
delayed" flags of any data items that have moved to this
ownership group during the ownership transfer of this
ownership group.

Failure Recovery

It is possible for a failure to occur while an ownership
change is in progress. The failure may be the result of a
"process death" or a "server death". A process death occurs
when a particular process involved in the ownership change
fails. A server death occurs when an entire database server
fails. With both of these failure types, all of the changes that
have not yet been stored on persistent storage may be lost.
After such a failure, it is necessary to return the database to
a consistent state.

According to one embodiment of the invention, recovery
from process death is performed through the use of a state
object. A state object is a data structure that is allocated in
a memory region associated with the database server to
which the process belongs. Prior to performing an action, the
process updates the state object to indicate the action it is
going to perform. If the process dies, another process within
the database server (e.g. a "process monitor") invokes a
method of the state object (a "clean up routine") to return the
database to a consistent state.

The specific acts performed to clean up after a process
failure depend on what operation the dead process was
performing, and how far the dead process had executed
before it died. According to one embodiment, process
failures during an ownership change of an ownership group are
handled as follows:

If the process performing the ownership change dies
before it makes the final control file change, then the original
owner is restored as the owner of the ownership group.

If the process performing the ownership change dies after
it makes the final control file change but before it deletes the
state object, then the new owner remains the owner, and the
state object is deleted.

Process failures that occur while transferring a data item
from one ownership group to another are handled as follows:

If the process performing the transfer dies before the
change to the data dictionary, then the original owner of the
data item will be restored as the owner of the data item.

If the process performing the transfer dies after the
changes to the dictionary have been committed, but before
the final control file change, then the process monitor
completes the move and performs the appropriate change to
the control file. If the ownership group is in the middle of an
ownership change, the data items are marked as "move
delayed".

If the process performing the transfer dies after the final
control file change but before the state object is deleted, the
process monitor will delete the state object.

Server Death

While a database server is dead, no access is provided to
the data in the ownership groups that were owned
exclusively by the dead server. Therefore, according to one
embodiment of the invention, server death is an event that
triggers an automatic ownership change, where all
ownership groups exclusively owned by the failed server are
assigned to new owners.

The specific acts performed to clean up after a server
failure depend on what operation the database server was
performing, and how much of an ownership transfer
operation was performed before the server died. According to one
embodiment, server failures during an ownership change of
an ownership group are handled as follows:

If the source database server dies before the final control
file change is made, then the ownership group is assigned to
another thread, and the status information in the control file
is updated to indicate that the ownership group is no longer
in transition.

If the target database server dies, then either (1) the
process performing the transition will detect that the
instance died and abort the transition, or (2) during recovery
of the dead server, the ownership group will be reassigned
from the dead server to another server.

Server failures that occur while transferring a data item
from one ownership group to another are handled as follows:

If the source server dies before the dictionary change, then
during recovery of the server, new owners will be assigned
to the source ownership group and the move flag of the data
item will be cleared.

If the source server dies after the dictionary change but
before the final control file change, then during the recovery
of the source server, the move operation will be finished by
either assigning the right owner to the data item, or by
marking it as move delayed.

If the target server dies and the final control file change is
made, then the data item is marked as "move delayed".
During the recovery of the dead server, the ownership of the
transitioning ownership group will be reassigned and the
move delayed flag will be cleared.

Reducing Downtime During Ownership Change

As described above, the steps illustrated in FIG. 4
represent one technique for changing ownership of an
ownership group. In this technique, step 402 requires the
ownership change mechanism to wait for all transactions that
have made changes to data that belongs to the transitioning
ownership group to either commit or roll back. During this
wait, all data in the transitioning ownership group is
unavailable. Therefore, it is important to minimize the duration of
the wait.

As described above, it may not be practical to track which
transactions actually made changes to data that belongs to
the transitioning ownership group. Therefore, the ownership
change mechanism waits for all transactions that are
executing in all database servers that belong to the source owner set
of the transitioning ownership group to either commit or roll
back. Due to the number of transactions the ownership
change mechanism must wait upon, many of which may not
have even made changes to data from the transitioning
ownership group, the delay may be significant.

According to an alternative embodiment, a mechanism is
provided that allows the data that is being transitioned
between owners to remain available during this delay.
Specifically, a disable change message is not sent to all
database servers. Rather, a "new owner" message is sent to
all database servers indicating the target owner set of the
ownership group. The new owner message may be
broadcast, for example, by sending a refresh cache message
to all database servers after updating the control file to
indicate (1) the source owner set, (2) the target owner set,
and (3) that the ownership group is in transition.

All transactions started by a server after the server
receives the new owner message act as though the target
owner set owns the ownership group. All transactions that
started in a server before the server receives the new owner
message continue to act as though the source owner set owns
the ownership group. Thus, during the waiting period,
ownership of the transitioning ownership group is
effectively shared between the members of the source owner set
and the members of the target owner set. In other words, the
data of the transitioning ownership group is temporarily
shared among two database servers and the shared disk
locking mechanism is temporarily activated for access to
such data.

When all of the transactions in the source owner set that
were begun prior to the broadcast of the new owner message
have either committed or rolled back, the control file is
updated a second time. During the second update, the
control file is updated to indicate that the target owner set is
the exclusive owner set for the ownership group, and that the
ownership group is no longer in transition.

Access-type Based Ownership

According to one embodiment of the invention, the
membership of the owner set of an ownership group differs
depending on the type of access being performed. For
example, an ownership group may have one owner set for
read accesses, and a different owner set for write accesses.
Similarly, an ownership group may have one owner set for
forward changes, and a different owner set for backward
changes and reads.

According to one approach, the owner set of an ownership
group for read accesses includes all of the available database
servers, while the owner set for the same ownership group
for write accesses is limited to a single database server.
Thus, with respect to reads the ownership group is accessed
as a shared disk ownership group, and with respect to writes
the ownership group is accessed as a shared nothing
ownership group.
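
As a rough illustration of access-type based ownership, the sketch
below keeps a separate owner set per access type and uses it to
decide whether a server performs an operation itself or ships it to
an owner. The class OwnershipGroup, its method names, the access-type
strings, and the rule for picking an owner are hypothetical choices
made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class OwnershipGroup:
    """Hypothetical ownership group with one owner set per access type."""
    name: str
    # Maps an access type ("read", "write", ...) to the set of owning servers.
    owners_by_access: dict = field(default_factory=dict)

    def is_owner(self, server, access_type):
        """True if `server` may directly perform `access_type` on this group."""
        return server in self.owners_by_access.get(access_type, frozenset())

    def route(self, requesting_server, access_type):
        """Return the server that should perform the operation.

        An owner performs the access itself; a non-owner ships the
        operation to one of the owners (picked deterministically here).
        """
        owners = self.owners_by_access.get(access_type, frozenset())
        if not owners:
            raise LookupError(f"no owner set for {access_type!r} on {self.name}")
        if requesting_server in owners:
            return requesting_server
        return min(owners)


ALL_SERVERS = frozenset({"server_A", "server_B", "server_C"})

# Reads behave as shared disk (every server owns the group for reads),
# writes behave as shared nothing (a single server owns it for writes).
group = OwnershipGroup(
    name="group_1",
    owners_by_access={"read": ALL_SERVERS, "write": frozenset({"server_A"})},
)

read_target = group.route("server_B", "read")
write_target = group.route("server_B", "write")
```

Here server_B reads the group directly, while a write requested on
server_B is shipped to server_A.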

In the foregoing specification, the invention has been 
described with reference to specific embodiments thereof. It 
will, however, be evident that various modifications and 
changes may be made thereto without departing from the
broader spirit and scope of the invention. The specification 
and drawings are, accordingly, to be regarded in an illus- 
trative rather than a restrictive sense. 

What is claimed is: 

1. A method for transitioning ownership of a data item, the
method comprising the steps of:

a) disabling access to the data item;

b) waiting for all transactions that have made changes to
the data item to either commit or abort;

c) if any transactions that made changes to the data item
abort, then removing all changes to the data item that
were made before access to the data item was disabled
by the transactions that abort;

d) changing data that indicates ownership of the data item
from a first owner to a second owner; and

e) enabling access to the data item. 

2. The method of claim 1 wherein the step of changing 
data that indicates ownership of the data item is performed
as an atomic operation.

3. The method of claim 1 wherein: 

the data item belongs to an ownership group initially
owned by said first owner; and

the step of changing data that indicates ownership of the
data item from a first owner to a second owner includes
changing the owner of said ownership group from said
first owner to said second owner.

4. The method of claim 1 wherein:

the data item initially belongs to a first ownership group
owned by said first owner; and

the step of changing data that indicates ownership of the
data item from a first owner to a second owner includes
changing data that indicates the ownership group to
which said data item belongs to reflect that said data
item belongs to a second ownership group owned by
said second owner.

5. The method of claim 3 further including the step of 
responding to a failure of a process that is performing said 
ownership transition by performing the steps of:

determining whether the process failed before changing
the data that indicates ownership of the ownership
group;

if the process failed before changing the data that indicates
ownership of the ownership group, then restoring
the first owner as owner of the ownership group; and

if the process failed after changing the data that indicates
ownership of the ownership group, then retaining the
second owner as owner of the ownership group.

6. The method of claim 4 wherein: 

the step of changing the data that indicates the ownership 
group to which said data item belongs is performed by 
changing data in a first file; 

the method further includes the step of updating a second
file to reflect that the data item belongs to the second
ownership group before changing data in said first file.

7. The method of claim 6 further comprising the step of
responding to a failure of a process that is performing said 
ownership transition by performing the steps of: 

determining whether the process performing the owner- 
ship transition died before the change to the second file; 

if the process performing the ownership transition died 
before the change to the second file, then restoring the 
data item as a member of said first ownership group; 

if the process performing the ownership transition died 
after the change to the second file but before the change 
to the first file, then completing the transition to said 
second ownership group by updating said first file. 

8. The method of claim 6 further comprising the steps of: 
determining whether the second ownership group is
undergoing an ownership change; and

if the second ownership group is undergoing an ownership
change, then marking the data item as move
delayed.

9. The method of claim 2 wherein the atomic operation 
includes the steps of: 

maintaining a set of related blocks that includes a first 
block and a second block, the first block that storing 
data that indicates that the data item is owned by the 
first owner; 

maintaining at least one flag that corresponds to the set of 
related blocks, said at least one flag indicating that said 
first block is valid and that said second block is not 
valid; 

updating a second block to indicate that the data item is
owned by said second owner; and

updating said at least one flag to indicate that said first
block is not valid and that said second block is valid.

10. A method for transitioning to a second owner set the 
ownership of a data item that is initially owned by a first 
owner set, the method comprising the steps of: 

informing a plurality of database servers that the data item 
is in the process of being transitioned from the first 
owner set to the second owner set; 

after informing said plurality of database servers, concur- 
rently allowing both members of said first owner set 
and members of said second owner set to directly 
access said data item; 

detecting when all transactions that are accessing said 
data item through said first owner set have either 
committed or aborted; 

after detecting that all transactions that are accessing said 
data item through said first owner set have either 
committed or aborted, performing the steps of storing 
data that indicates that the second owner set is the 
exclusive owner of the data item; and 

allowing only members of said second owner set to 
directly access the data item. 

11. The method of claim 10 wherein the step of allowing 
both members of said first owner set and members of said 
second owner set to directly access said data item includes 
the steps of: 

allowing processes executing in members of said first 
owner set that had accessed the data item before the 
step of informing to continue to directly access the data 
item after the step of informing; and 

causing all processes that begin after the step of informing 
to access the data item through members of said second 
owner set. 

12. The method of claim 11 wherein the step of detecting
when all transactions that are accessing said data item
through said first owner set have either committed or aborted
is performed by detecting when all transactions that began 
execution prior to the step of informing have either com- 
mitted or aborted. 

13. A computer-readable medium carrying instructions for
transitioning ownership of a data item, the instructions
including instructions for performing the steps of:

a) disabling access to the data item;

b) waiting for all transactions that have made changes to
the data item to either commit or abort;

c) if any transactions that made changes to the data item
abort, then removing all changes to the data item that
were made before access to the data item was disabled
by the transactions that abort;

d) changing data that indicates ownership of the data item
from a first owner to a second owner; and

e) enabling access to the data item. 

14. The computer-readable medium of claim 13 wherein 
the step of changing data that indicates ownership of the data 
item is performed as an atomic operation.

15. The computer-readable medium of claim 13 wherein:

the data item belongs to an ownership group initially
owned by said first owner; and

the step of changing data that indicates ownership of the
data item from a first owner to a second owner includes
changing the owner of said ownership group from said
first owner to said second owner.

16. The computer-readable medium of claim 13 wherein:

the data item initially belongs to a first ownership group
owned by said first owner; and

the step of changing data that indicates ownership of the
data item from a first owner to a second owner includes
changing data that indicates the ownership group to
which said data item belongs to reflect that said data
item belongs to a second ownership group owned by
said second owner. 

17. The computer-readable medium of claim 15 further 
including instructions for responding to a failure of a process 
that is performing said ownership transition by performing 
the steps of:

determining whether the process failed before changing
the data that indicates ownership of the ownership
group;

if the process failed before changing the data that indicates
ownership of the ownership group, then restoring
the first owner as owner of the ownership group; and

if the process failed after changing the data that indicates
ownership of the ownership group, then retaining the
second owner as owner of the ownership group.

18. The computer-readable medium of claim 16 wherein: 
the step of changing the data that indicates the ownership
group to which said data item belongs is performed by
changing data in a first file;

the computer-readable medium further includes instructions
for performing the step of updating a second file
to reflect that the data item belongs to the second
ownership group before changing data in said first file.

19. The computer-readable medium of claim 18 further
comprising instructions for performing the step of responding
to a failure of a process that is performing said ownership
transition by performing the steps of:

determining whether the process performing the owner- 
ship transition died before the change to the second file; 

if the process performing the ownership transition died
before the change to the second file, then restoring the 
data item as a member of said first ownership group; 



if the process performing the ownership transition died 
after the change to the second file but before the change 
to the first file, then completing the transition to said 
second ownership group by updating said first file. 

20. The computer-readable medium of claim 18 further 
comprising instructions for performing the steps of: 

determining whether the second ownership group is 
undergoing an ownership change; and 

if the second ownership group is undergoing an owner- 
ship change, then marking the data item as move 
delayed. 

21. The computer-readable medium of claim 14 wherein 
the atomic operation includes the steps of: 

maintaining a set of related blocks that includes a first 
block and a second block, the first block that storing 
data that indicates that the data item is owned by the 
first owner; 

maintaining at least one flag that corresponds to the set of 
related blocks, said at least one flag indicating that said 
first block is valid and that said second block is not 
valid; 

updating a second block to indicate that the data item is
owned by said second owner; and

updating said at least one flag to indicate that said first
block is not valid and that said second block is valid.

22. A computer-readable medium carrying instructions for
transitioning to a second owner set the ownership of a data 
item that is initially owned by a first owner set, the instruc- 
tions including instruction for performing the steps of: 

informing a plurality of database servers that the data item 
is in the process of being transitioned from the first 
owner set to the second owner set; 

after informing said plurality of database servers, concur- 
rently allowing both members of said first owner set 
and members of said second owner set to directly 
access said data item; 

detecting when all transactions that are accessing said 
data item through said first owner set have either 
committed or aborted; 

after detecting that all transactions that are accessing said 
data item through said first owner set have either 
committed or aborted, performing the steps of storing 
data that indicates that the second owner set is the 
exclusive owner of the data item; and 

allowing only members of said second owner set to 
directly access the data item. 

23. The computer-readable medium of claim 22 wherein 
the step of allowing both members of said first owner set and 
members of said second owner set to directly access said 
data item includes the steps of: 

allowing processes executing in members of said first 
owner set that had accessed the data item before the 
step of informing to continue to directly access the data 
item after the step of informing; and 

causing all processes that begin after the step of informing 
to access the data item through members of said second 
owner set. 

24. The computer-readable medium of claim 23 wherein 
the step of detecting when all transactions that are accessing 
said data item through said first owner set have either 
committed or aborted is performed by detecting when all 
transactions that began execution prior to the step of informing
have either committed or aborted.